What is the best description that we can construct of a thermodynamic system that is not in
equilibrium, given only one, or a few, extra parameters over and above those needed for a description 
of the same system at equilibrium? Here, we argue the most appropriate additional parameter is the 
non-equilibrium entropy of the system, and that we should not attempt to estimate the probability 
distribution of the system, but rather the metaprobability (or hyperensemble) that the system 
is described by a particular probability distribution. The result is an entropic distribution with 
two parameters, one a non-equilibrium temperature, and the other a measure of distance from 
equilibrium. This dispersion parameter smoothly interpolates between certainty of a canonical 
distribution at equilibrium and great uncertainty as to the probability distribution as we move away 
from equilibrium. We deduce that, in general, large, rare fluctuations become far more common as 
we move away from equilibrium. 

PACS numbers: 05.70.Ln, 05.40.-a 



Consider a gas confined to a piston, as illustrated in
Fig. 1. The realization on the left was sampled from
thermal equilibrium with a fixed plunger. To describe
the probability of every single possible configuration of
the particles we only need to know the Hamiltonian of
the system and the temperature of the environment.
On the other hand, the system on the right has been 
sampled from a non-equilibrium ensemble. Although the 
Hamiltonian is the same, the plunger has recently been 
in violent motion, and this perturbation has driven the 
ensemble away from equilibrium. To describe the config- 
urational probability we now need to know the entire past 
history of perturbations that the system has undergone. 
The dynamics and historical details matter. 

This example illustrates the essential difficulty we
face when trying to directly extend equilibrium statistical 
mechanics out of equilibrium. There is only one ensemble 
that can describe a given system in thermal equilibrium, 
but there are a multitude of ways that the same system 
can be out-of-equilibrium. The constraint that the equi- 
librium entropy is maximized is a very strong condition. 
However, let us take a step back, and reflect that statis- 
tical mechanics itself is designed to circumvent a similar 
difficulty. In classical mechanics we typically assume that 
we know the exact microstate of the system. However, in 
statistical mechanics, we recognize that often such a de- 
tailed description is neither possible nor desirable. A few 
bulk measurements or parameters do not provide nearly 
enough information to fix the microstate. Instead we con- 
tent ourselves with calculating the probability that the 
system occupies a particular microstate. To ask what the 
state of the system is, rather than what it could be, is to 
ask an unnecessarily difficult question. 

FIG. 1: Schematic realizations of a gas confined to a piston
in and out of equilibrium.

Out of equilibrium we essentially face the same problem,
compounded. Clearly we cannot obtain enough information
from a few measurements to determine the microscopic
state of the system, but if the system is out
of equilibrium then a few parameters or measurements
are not sufficient (in general) to determine the ensemble
either. Therefore, perhaps the correct approach is
not to try to determine what the probability distribu- 
tion of the system is, but instead attempt to determine 
what the probabilities could be. In other words, instead 
of thinking about an ensemble of systems, we instead 
envisage an ensemble of ensembles, a 'hyperensemble', 
where each member of the hyperensemble has the same 
instantaneous Hamiltonian, but is described by a differ- 
ent probability distribution. We seek a generic descrip- 
tion of the typical non-equilibrium ensemble given a few 
parameters or measurements that describe the average 
behavior of the hyperensemble. 

This basic approach is borrowed from Bayesian statistics,
where it is not uncommon to estimate the probability
of a probability density (a 'metaprobability') when
the available data is too sparse to reliably estimate the
probability directly. Reference [3] contains a lucid
description of this procedure in the context of amino
acid sequence profiles. The hyper- prefix is also borrowed
from Bayesian statistics, where it is usual to talk about
hyperpriors (a prior distribution of a prior distribution) and
associated hyperparameters.

With this insight, we can move beyond the standard
canonical ensemble by changing the question. Rather than
trying to find the probability distribution $\theta$ of the system
directly, we estimate the metaprobability $P(\theta)$,
the probability of the microstate probability distribution.
We proceed analogously to the maximum entropy derivation
of equilibrium statistical mechanics. We will
find the probability distribution of ensembles $P(\theta)$ that
maximizes the entropy $H$ of the hyperensemble,



$$ H = -\int P(\theta) \log \frac{P(\theta)}{m(\theta)} \, d\theta \qquad (1) $$

while maintaining certain appropriate constraints. Here,
$m(\theta)$ is a measure on the space of probability
distributions. It acts as a prior and ensures that this entropy is
invariant under a change of variable.

The trick to maximum entropy methods is finding the 
appropriate constraints, since with an arbitrary choice of 
constraint and prior practically any answer can be man- 
ufactured. To avoid this trap, we seek a minimal set 
of physically and mathematically reasonable parameters. 
Clearly, the hyperensemble must be normalized. 



$$ 1 = \int P(\theta) \, d\theta . \qquad (2) $$



And, by analogy with the canonical ensemble, we should
constrain the mean energy of the ensemble of ensembles,

$$ \langle E \rangle = \int P(\theta) \sum_i \theta_i E_i \, d\theta , \qquad (3) $$

where $E_i$ is the energy of microstate $i$.

Thus far, we have incorporated the same information
and constraints that lead to the canonical ensemble,
namely the density of energy states, normalization
and mean energy. To move beyond the canonical en- 
semble we require a measure of how far the system is 
from equilibrium. After all, the quintessential feature of 
non-equilibrium systems is that they are not in equilib- 
rium. What is the most appropriate measure? If the 
system were in equilibrium, then the entropy would be 
maximized given the constraints. It follows that out- 
of-equilibrium the entropy of the ensemble is not maxi- 
mized, and moreover, the entropy cannot be determined 
with any certainty from a measurement of the mean en- 
ergy alone. Therefore, the entropy itself can be used as 
an additional, physically relevant constraint. 



$$ \langle S \rangle = \int P(\theta) \left[ -\sum_i \theta_i \log \theta_i \right] d\theta \qquad (4) $$

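As a numerical illustration of why the ensemble entropy can serve as a
measure of distance from equilibrium, the following Python sketch perturbs
a canonical distribution while holding normalization and mean energy fixed,
and checks that the entropy can only decrease. The three-level spectrum, the
value of beta, and the helper names are illustrative assumptions that do not
come from the text.

```python
import numpy as np

def entropy(p):
    """Gibbs-Shannon entropy -sum_i p_i log p_i (natural log, 0 log 0 = 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def canonical(energies, beta):
    """Canonical distribution rho_i proportional to exp(-beta * E_i)."""
    w = np.exp(-beta * np.asarray(energies, dtype=float))
    return w / w.sum()

# Hypothetical three-level system with equally spaced energies.
energies = np.array([0.0, 1.0, 2.0])
rho = canonical(energies, beta=1.0)

# A perturbation direction that preserves both constraints:
# sum(v) = 0 (normalization) and sum(v * E) = 0 (mean energy).
v = np.array([1.0, -2.0, 1.0])

for eps in (-0.05, -0.01, 0.0, 0.01, 0.05):
    theta = rho + eps * v
    print(f"eps={eps:+.2f}  <E>={theta @ energies:.4f}  S={entropy(theta):.4f}")
```

Every perturbed ensemble has the same mean energy as the canonical one but a
lower entropy, so the entropy deficit carries information that the mean
energy alone does not.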
To summarize, we will maximize the entropy of the
hyperensemble (Eq. 1) subject to normalization, the mean
energy and the mean ensemble entropy (Eq. 4). The
solution to this problem is found by introducing Lagrange
multipliers $\{\lambda\}$ and then applying the calculus of variations
in the usual way:

$$ P(\theta) = m(\theta) \exp\left( -\lambda_0 - \lambda_1 \sum_i \theta_i E_i - \lambda_2 \sum_i \theta_i \log \theta_i \right) \qquad (5) $$
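The variational step can be sketched explicitly. The sign and offset
conventions for the multipliers below are assumptions, chosen so that the
substitutions $\lambda_0 = \log Z$, $\lambda_1 = \lambda\beta$ and
$\lambda_2 = \lambda$ used in the text lead to the stated exponential form;
$S(\theta)$ denotes the entropy of a single ensemble.

```latex
\begin{align*}
\mathcal{L}[P] &= -\int P(\theta)\,\log\frac{P(\theta)}{m(\theta)}\,d\theta
  - (\lambda_0 - 1)\left[\int P(\theta)\,d\theta - 1\right] \\
 &\quad - \lambda_1\left[\int P(\theta)\sum_i \theta_i E_i\,d\theta
        - \langle E\rangle\right]
  + \lambda_2\left[\int P(\theta)\,S(\theta)\,d\theta
        - \langle S\rangle\right],
 \qquad S(\theta) = -\sum_i \theta_i\log\theta_i, \\
0 = \frac{\delta\mathcal{L}}{\delta P(\theta)}
 &= -\log\frac{P(\theta)}{m(\theta)} - \lambda_0
    - \lambda_1\sum_i\theta_i E_i + \lambda_2\,S(\theta), \\
P(\theta) &= m(\theta)\exp\Big(-\lambda_0 - \lambda_1\sum_i\theta_i E_i
    - \lambda_2\sum_i\theta_i\log\theta_i\Big).
\end{align*}
```

The functional derivative of the entropy term contributes
$-\log(P/m) - 1$; the offset in the normalization multiplier absorbs the
stray $-1$.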




FIG. 2: The entropic distribution (Eq. 8) over 2 states. (a)
Reference distribution $\rho = (0.5, 0.5)$, $\lambda = 0, 1, 2, 4, 8$ (broad
to peaked). (b) $\lambda = 4$, $\rho_1 = 0.05, 0.20, 0.35, 0.5, 0.65, 0.80, 0.95$
(left to right). (c) $\rho = (0.1, 0.95)$, $\lambda = 0.5, 1, 2, 4, 8$. (d)
Same $\rho$, log scale, $\lambda = 0.5, 1, 2, 4, 8, 16, 32, 64, 128$. Note that
the reference distribution controls the mode and that as the
dispersion parameter $\lambda$ approaches 0 the distributions become
broader and the mean moves towards 1/2.



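Some manipulation will illuminate the significance of
this expression. Let us rewrite with $\lambda_0 = \log Z$,
$\lambda_1 = \lambda\beta$ and $\lambda_2 = \lambda$,

$$ P(\theta) = \frac{m(\theta)}{Z} \exp\left( -\lambda \left[ \beta \sum_i \theta_i E_i + \sum_i \theta_i \log \theta_i \right] \right) . \qquad (6) $$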

In the absence of any compelling evidence to the contrary,
we will assume a uniform, uninformative prior over
probabilities, $m(\theta) \propto$ constant. The parameter $\beta$ has units
of entropy per unit of energy and is effectively an inverse
temperature. Therefore, we can naturally introduce a
canonical ensemble with the same effective temperature,
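
$$ \rho_i = \frac{1}{Q(\beta)} \exp(-\beta E_i) , \qquad (7) $$

and rewrite the maximum entropy hyperensemble as

$$ P(\theta) = \frac{1}{Z(\beta, \lambda)} \exp\left( -\lambda \sum_i \theta_i \log \frac{\theta_i}{\rho_i} \right) . \qquad (8) $$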



It is now evident that our hyperensemble has the
functional form of the entropic distribution, a probability of
probabilities that occasionally occurs in Bayesian
statistics. This same functional form also appears
as the asymptotic limit of the multinomial
distribution with large sample sizes in large deviation
theory, and as the natural conjugate prior of the
Dirichlet distribution.
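For a system with only two states the entropic distribution can be
tabulated directly on a grid. The sketch below assumes the uniform prior
adopted in the text; the function names, the grid resolution and the
example values of the reference probability and dispersion are illustrative
choices.

```python
import numpy as np

def kl_two_state(t, rho1):
    """D(theta || rho) for theta = (t, 1 - t) and rho = (rho1, 1 - rho1)."""
    t = np.asarray(t, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        # np.where selects 0 at the endpoints, where 0 * log 0 is taken as 0.
        a = np.where(t > 0, t * np.log(t / rho1), 0.0)
        b = np.where(t < 1, (1 - t) * np.log((1 - t) / (1 - rho1)), 0.0)
    return a + b

def entropic_density(rho1, lam, n_grid=2001):
    """Grid approximation to P(theta_1) ∝ exp(-lam * D), uniform prior."""
    t = np.linspace(0.0, 1.0, n_grid)
    w = np.exp(-lam * kl_two_state(t, rho1))
    dt = t[1] - t[0]
    norm = dt * (w.sum() - 0.5 * (w[0] + w[-1]))  # trapezoid rule
    return t, w / norm

t, p = entropic_density(rho1=0.2, lam=8.0)
print("mode at theta_1 =", t[np.argmax(p)])
```

The density peaks at the reference probability, and setting `lam=0.0`
returns a flat density over the whole interval.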

The entropic distribution over a binary state space is
illustrated in Fig. 2 and with a Gaussian reference (e.g.
a particle in a harmonic potential) in Fig. 3. We see that
as $\lambda$ decreases the dispersion of the probability
distributions increases, the mean distribution moves away from
the canonical distribution, the average probability of rare
states increases, and the probability of common states
decreases to compensate. Moreover, in Fig. 3 we see that
$\lambda$ controls a crossover in behavior: if $\rho > 1/\lambda$ then the
uncertainty in $\theta$ and the bias away from equilibrium are
relatively small, whereas for rare states, $\rho < 1/\lambda$, the
perturbations are large. Therefore, the generic, predicted
behavior is that rare events typically (but not necessarily)
become far more common as the condition of thermal
equilibrium is relaxed.
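The growth in the average probability of a rare state can be checked
numerically for a two-state system. This is a minimal sketch assuming a
uniform prior; the reference probability 0.05, the list of dispersion
values and the function name are hypothetical.

```python
import numpy as np

def mean_theta1(rho1, lam, n_grid=20001):
    """Hyperensemble mean of theta_1 for a two-state system, uniform prior."""
    # An open grid of midpoints avoids log(0) at the endpoints.
    t = (np.arange(n_grid) + 0.5) / n_grid
    d = t * np.log(t / rho1) + (1 - t) * np.log((1 - t) / (1 - rho1))
    w = np.exp(-lam * d)
    # The grid is uniform, so the spacing cancels in the ratio.
    return float(np.sum(t * w) / np.sum(w))

rho1 = 0.05  # hypothetical rare state
for lam in (128, 32, 8, 2, 0):
    print(f"lambda={lam:>3}  <theta_1>={mean_theta1(rho1, lam):.4f}")
```

As the dispersion parameter falls, the mean probability of the rare state
climbs from just above its canonical value towards 1/2.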

We can deduce some important properties of the
hyperensemble by noting that the function in the
exponential of Eq. 8 is the relative entropy of $\theta$ to the reference
canonical distribution, $\rho$,

$$ D(\theta \| \rho) = \sum_i \theta_i \log \frac{\theta_i}{\rho_i} . \qquad (9) $$

This is a natural measure of how distinguishable one
distribution is from another. Since the relative entropy is
zero if the distributions are identical, and positive if they
are not, it immediately follows that the mode of the
entropic distribution is located at the reference $\rho$. In other
words, the single most probable distribution of the
hyperensemble is a canonical distribution controlled by the
effective inverse temperature $\beta$, and the dispersion of the
hyperensemble about that mode is controlled by the inverse
scale parameter $\lambda$. If $\lambda$ is very large the hyperensemble
collapses to a single point at the mode and we recover the
canonical ensemble of equilibrium statistical mechanics.
It follows that the reference temperature is numerically
equal to the conventional temperature of the same
system with the same mean energy at thermal equilibrium.
As $\lambda$ decreases the dispersion increases and typical
distributions differ significantly from the reference, until at
$\lambda = 0$ every distribution is equally likely.

FIG. 3: The entropic distribution (Eq. 8) with a Gaussian
reference distribution (zero mean, unit variance) and
dispersion $\lambda = 100$. The dashed line is the reference $\rho$, the points
are a single Monte Carlo sample of $\theta$, and the solid line is the
mean distribution $\langle\theta\rangle$. Note that the variation of $\theta$ away from
the reference is relatively large for intrinsically rare states,
$\rho < 1/\lambda$.

Another way of looking at the canonical hyperensemble
is to note that the relative entropy of $\theta$ to a canonical
reference $\rho$ can be interpreted as a generalized free energy
difference [15],

$$ D(\theta \| \rho) = \beta F(\theta) - \beta F(\rho) . \qquad (10) $$

Since $\rho$ is canonical, $F(\rho) = \langle E \rangle - S/\beta$ is the Helmholtz
free energy, whereas $F(\theta)$ can be interpreted as a
generalized, non-canonical free energy. Using these definitions,
the canonical hyperensemble can be written as

$$ P(\theta) \propto \exp\{ -\lambda \beta [ F(\theta) - F(\rho) ] \} . \qquad (11) $$

The physical picture is that near thermal equilibrium the
ensemble that minimizes the free energy dominates the
hyperensemble. As we move away from equilibrium the
free energy is no longer necessarily minimized. Rather
the probability of obtaining a particular ensemble out
of equilibrium is determined by the generalized free
energy difference between that ensemble and the reference
canonical ensemble. This expression is pleasingly
reminiscent of the thermodynamic fluctuation representation
of standard statistical mechanics, except we are now
looking at fluctuations in ensemble rather than state.
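The link between relative entropy and free energy can be verified
numerically. The sketch below takes $F = \langle E \rangle - S/\beta$, the sign
convention under which the relative entropy equals $\beta F(\theta) - \beta F(\rho)$
and is therefore non-negative; the four-level spectrum, the inverse
temperature and the random test ensemble are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
energies = np.array([0.0, 0.5, 1.3, 2.0])  # hypothetical spectrum
beta = 1.7                                 # hypothetical inverse temperature

def entropy(p):
    """Entropy -sum_i p_i log p_i of a strictly positive distribution."""
    return float(-np.sum(p * np.log(p)))

def free_energy(p):
    """Generalized free energy F(p) = <E>_p - S(p) / beta."""
    return float(p @ energies) - entropy(p) / beta

# Canonical reference distribution at inverse temperature beta.
rho = np.exp(-beta * energies)
rho /= rho.sum()

# An arbitrary non-canonical ensemble over the same states.
theta = rng.dirichlet(np.ones(len(energies)))
kl = float(np.sum(theta * np.log(theta / rho)))

print(kl, beta * (free_energy(theta) - free_energy(rho)))
```

The two printed numbers agree to rounding error, and both are positive
because the canonical ensemble has the lowest free energy under this
convention.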

We can also derive the entropic hyperensemble by
directly constraining the mean relative entropy $\langle D(\theta \| \rho) \rangle$.
From the viewpoint of information theory, this is the
average penalty for encoding states of the system
assuming that they are drawn from the reference distribution
$\rho$ rather than the true distribution $\theta$. This
measure is very similar to the Jensen-Shannon divergence
$\langle D(\theta \| \langle\theta\rangle) \rangle$, except that the reference distribution is
the mode, rather than the mean, of $\theta$.
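The coding interpretation can be made concrete with a three-state example
measured in bits. The two distributions below are hypothetical and were
picked so that all code lengths are whole numbers.

```python
import math

def entropy_bits(p):
    """Mean length of the optimal code for states drawn from p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """Mean code length when states drawn from p are coded optimally for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def relative_entropy_bits(p, q):
    """D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

theta = (0.5, 0.25, 0.25)   # hypothetical true distribution
rho = (0.25, 0.25, 0.5)     # hypothetical reference distribution

penalty = cross_entropy_bits(theta, rho) - entropy_bits(theta)
print(penalty, relative_entropy_bits(theta, rho))  # both 0.25 bits
```

Coding with the reference costs 1.75 bits per state instead of the optimal
1.5, and the 0.25 bit excess is exactly the relative entropy.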

Currently, various modifications or extensions of
Boltzmann-Gibbs statistics are being investigated,
including Tsallis statistics (which modifies the entropic
function) [16] and maximum entropy production (which
modifies the constraints) [17]. Perhaps the most similar
approach to the present work is superstatistics, the
central idea of which is that a system may be locally in
equilibrium (either in time or space), but globally out
of equilibrium. Therefore, the system as a whole can be
described by a mixture of canonical ensembles, each with
a different local temperature. In contrast, the components
of the maximum entropy hyperensemble are not
required to be canonical. The essential difficulty with
superstatistics is that the distribution of effective
temperatures is unconstrained. It is therefore interesting to ask
what distribution of local temperature would maximize 
the hyperentropy given that the members of the
hyperensemble are canonical? Since the result will depend on
the density of states, let us explore a simple, but important,
special case: a collection of harmonic oscillators.
The partition function is $Q(\beta) = \beta^{-c}$ and therefore the
mean energy scales as $\langle E \rangle = c/\beta$, where the constant
$c$ is proportional to the size of the system. An obvious
choice for the prior is $m(T) \propto 1/T$. Plugging these
relations into Eq. 8 we find

$$ P(T) \propto \left( \frac{T}{T^\circ} \right)^{\lambda c - 1} e^{-\lambda c T / T^\circ} , \qquad (12) $$

where $T$ is the effective local temperature and $T^\circ = 1/\beta$
is the reference temperature. Here, with the hyperensemble
approach we predict that if the system is linear and
locally in equilibrium, then the temperature fluctuations
follow a gamma distribution with mean $T^\circ$
and standard deviation $T^\circ / \sqrt{c\lambda}$. If the temperature
fluctuations are not gamma distributed, then either the
system is not linear, not in local equilibrium, or we have
failed to incorporate some important, pertinent
information about the system [3].
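The gamma prediction can be checked by direct numerical integration. For
canonical members with partition function proportional to a power of the
inverse temperature, the relative entropy to the reference works out to
c [T/T0 - 1 - log(T/T0)]; that closed form is derived here from the stated
partition function rather than quoted from the text. The values of `c`,
`lam` and `T0` and the integration grid are hypothetical.

```python
import numpy as np

# Hypothetical system size, dispersion parameter and reference temperature.
c, lam, T0 = 3.0, 5.0, 2.0

T = np.linspace(1e-6, 40.0, 400_001)
x = T / T0
# Relative entropy of a canonical ensemble at temperature T to the reference
# at T0, using Q(beta) = beta**(-c) and <E> = c * T.
D = c * (x - 1.0 - np.log(x))
w = np.exp(-lam * D) / T      # hyperensemble weight with prior m(T) ∝ 1/T

dT = T[1] - T[0]
norm = w.sum() * dT
mean = (T * w).sum() * dT / norm
std = np.sqrt(((T - mean) ** 2 * w).sum() * dT / norm)

print(f"mean = {mean:.4f}   expected T0 = {T0:.4f}")
print(f"std  = {std:.4f}   expected T0/sqrt(c*lam) = {T0 / np.sqrt(c * lam):.4f}")
```

The weight is a gamma density with shape `lam * c` and scale
`T0 / (lam * c)`, which is why the mean stays at the reference temperature
while the spread shrinks as either the system size or the dispersion
parameter grows.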

It is worth noting that we would have obtained very
different results if we had chosen different constraints.
As previously mentioned, this is the essential weakness
of maximum entropy methods; we must rely on the
plausibility of the constraints, rather than the rigor of the
derivation. In particular, if we maximize the
hyperentropy given the mean relative entropy of the reference $\rho$
to the ensemble $\theta$, $\langle D(\rho \| \theta) \rangle$, we obtain a Dirichlet
distribution. This in turn leads to the prediction that the local
temperature of a linear system follows an inverse gamma
distribution, which is known to be equivalent to the
non-extensive thermodynamics of Tsallis [16, 18]. This is an
intriguing connection, but unfortunately $\langle D(\rho \| \theta) \rangle$ has no
immediately obvious deep physical or information
theoretic significance.
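The Dirichlet form follows from a one-line identity that is easy to confirm
numerically: with a uniform prior, the weight obtained from the reversed
relative entropy differs from a Dirichlet kernel with exponents
`lam * rho_i` only by a constant. The reference distribution, the
dispersion value and the sample size below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = np.array([0.6, 0.3, 0.1])   # hypothetical reference distribution
lam = 4.0                         # hypothetical dispersion parameter

def reversed_kl(rho, theta):
    """D(rho || theta) = sum_i rho_i log(rho_i / theta_i)."""
    return float(np.sum(rho * np.log(rho / theta)))

# A few arbitrary ensembles on the probability simplex.
thetas = rng.dirichlet(np.ones(len(rho)), size=5)

# Log of the unnormalized hyperensemble weight exp(-lam * D(rho || theta)).
log_weight = np.array([-lam * reversed_kl(rho, th) for th in thetas])
# Log of the Dirichlet kernel prod_i theta_i**(alpha_i - 1),
# with alpha_i = lam * rho_i + 1.
log_dirichlet = np.log(thetas) @ (lam * rho)

offset = log_weight - log_dirichlet
print(offset)  # the same constant for every theta
```

The constant offset equals `lam` times the entropy of the reference, so it
is absorbed into the normalization.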

In this paper, I have argued that a natural way of 
moving beyond equilibrium Boltzmann-Gibbs statistics 
is to change the question: Instead of trying to determine 
what the probability distribution of a system is, we in- 
stead ask what the probability distribution could be. We 
seek an ensemble of ensembles that captures the generic
properties of matter out of equilibrium. The
solution to this problem is found by maximizing the
entropy of the hyperensemble, given the mean energy and
mean ensemble entropy. This yields a physically
plausible description of fluctuations away from equilibrium,
a natural definition of temperature out of equilibrium, a
natural measure of distance away from equilibrium, and
the intuitively plausible prediction that rare events
typically become far more common as a system moves away
from thermal equilibrium.